README for US Global Health Aid Policy and Family Planning in Sub-Saharan Africa data and code


# Data Availability Statement
All data used in this analysis are publicly available, but in some cases researchers must create their own accounts and obtain permission to access the data. PMA data are available through [IPUMS PMA](https://pma.ipums.org/pma/). Data on foreign aid flows and national spending on health are available from the Institute for Health Metrics and Evaluation [Development Assistance for Health Database](https://ghdx.healthdata.org/record/ihme-data/development-assistance-health-database-1990-2019) and [Global Health Spending](https://ghdx.healthdata.org/record/ihme-data/global-health-spending-1995-2018). Replication code and data access instructions are available on the Harvard Dataverse: https://doi.org/10.7910/DVN/TXPLLZ.


# Directory Structure
To replicate our analysis, you must create the following folders in your project directory: `data-raw`, `data-clean`, `output`, and `scripts`. Put all of the scripts in this repository in the `scripts` subdirectory.
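As a minimal sketch, the four folders can be created from R, assuming the working directory is the project root. The helper name `create_project_dirs` is hypothetical and is not part of the replication scripts.

```r
# Hypothetical helper (not part of this repository): create the four
# folders named above under a given project root.
create_project_dirs <- function(root = ".") {
  dirs <- file.path(root, c("data-raw", "data-clean", "output", "scripts"))
  for (d in dirs) {
    # Existing folders are left untouched; missing parents are created.
    dir.create(d, recursive = TRUE, showWarnings = FALSE)
  }
  invisible(dirs)
}

create_project_dirs(".")
```

Running it a second time is harmless, since existing folders are not modified.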


# Data

## IPUMS PMA
IPUMS PMA has extensive documentation on how to access and use the data. We recommend reviewing the user guide at [IPUMS PMA User Guide](https://pma.ipums.org/pma/userguide.shtml) before starting. You must register for a free user account before you can download any data. The [IPUMS PMA Data Analysis Hub](https://tech.popdata.org/pma-data-hub/posts/2020-12-10-get-ipums-pma-data/index.html) also has resources on how to import IPUMS PMA data into R using the `ipumsr` package.

Once you have access to IPUMS PMA, create two data extracts:
* Family Planning - Person (woman-level data): see "IPUMS PMA Woman Extract.xlsx" for the list of samples and variables to select
* Family Planning - Service Delivery Point: see "IPUMS PMA SDP Extract.xlsx" for the list of samples and variables to select

Save the DDI file (.xml) and data file (.dat.gz) in `data-raw`.
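To confirm that an extract was saved correctly, it can be read with `ipumsr` before running the cleaning scripts. The sketch below is only a check, not the import code used in the scripts. The file name `pma_00001.xml` is a placeholder (IPUMS names files by extract number, so yours will likely differ), and `read_pma_extract` is a hypothetical helper.

```r
# Hypothetical helper: read one IPUMS extract from its DDI (.xml) file.
# read_ipums_micro() looks for the matching data file (.dat.gz) in the
# same folder as the DDI file.
read_pma_extract <- function(ddi_path) {
  if (!file.exists(ddi_path)) {
    message("DDI file not found: ", ddi_path)
    return(invisible(NULL))
  }
  if (!requireNamespace("ipumsr", quietly = TRUE)) {
    stop("Install the ipumsr package first: install.packages('ipumsr')")
  }
  ddi <- ipumsr::read_ipums_ddi(ddi_path)
  ipumsr::read_ipums_micro(ddi)
}

# Placeholder file name: replace with the name of your own extract.
woman_raw <- read_pma_extract(file.path("data-raw", "pma_00001.xml"))
if (!is.null(woman_raw)) print(dim(woman_raw))
```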

## IHME Data
From the IHME [Development Assistance for Health Database](https://ghdx.healthdata.org/record/ihme-data/development-assistance-health-database-1990-2019), download the `DAH Database.csv` file and save it in `data-raw`. We also recommend downloading and reviewing the accompanying user guide and codebook.

Also download and save the `Data.csv` file from [Global Health Spending](https://ghdx.healthdata.org/record/ihme-data/global-health-spending-1995-2018) in `data-raw`.
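A quick way to verify that both IHME files are in place is sketched below. The helper `check_raw_files` is hypothetical. It searches `data-raw` recursively, on the assumption that the files may sit either directly in `data-raw` or in a per-source subfolder.

```r
# Hypothetical helper: report which expected raw files exist anywhere
# under raw_dir. File names are the ones given in this README.
check_raw_files <- function(raw_dir = "data-raw",
                            expected = c("DAH Database.csv", "Data.csv")) {
  found <- basename(list.files(raw_dir, recursive = TRUE))
  present <- expected %in% found
  names(present) <- expected
  present
}

# Named logical vector: TRUE for each file that was found.
print(check_raw_files("data-raw"))
```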

## Other Data
We also use two other data sources in the construction of our data: the World Bank World Development Indicators (WDI) data on population size and a shapefile for the African continent. Both are available in the `data-raw` subdirectory on the Harvard Dataverse: https://doi.org/10.7910/DVN/TXPLLZ.


# Scripts
There are 5 scripts to reproduce the data cleaning and analysis for this paper.

1. `functions.R`: Loads the necessary packages and defines functions used in the subsequent scripts.

2. `01_calculate_exposure.Rmd`: Uses both IHME datasets and the WDI data to calculate the exposure measures. Outputs `data-clean/exposure.Rds`.

3. `02_clean_pma_woman_data.Rmd`: Imports raw PMA woman-level data and prepares for analysis by constructing key variables and making sample restrictions. Outputs `data-clean/plgha_woman_df.Rds`.

4. `03_clean_pma_sdp_data.Rmd`: Imports raw PMA SDP-level data and prepares for analysis by constructing key variables and making sample restrictions. Outputs `data-clean/plgha_sdp_df.Rds`.

5. `04_run_analysis.Rmd`: Runs all analyses for the paper end-to-end and produces the tables and figures, including those in the Supplementary Materials. To replicate the full analysis, users will need to obtain access to the datasets described above.


# Steps
1. Obtain access to requisite data.
2. Set up the project directory according to the following structure:
    - scripts/
    - data-raw/
        - PMA
        - IHME
        - Other
    - data-clean/
    - output/
3. Store all raw data in `data-raw` according to the source. Store all 5 R scripts from this repository in the `scripts` directory.
4. Run scripts in sequential order: `01_calculate_exposure.Rmd`, `02_clean_pma_woman_data.Rmd`, `03_clean_pma_sdp_data.Rmd`, `04_run_analysis.Rmd`. The scripts will output clean data into `data-clean` and all figures and tables into `output`.
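
The run order in step 4 could be scripted as in the sketch below. It assumes the `.Rmd` files are meant to be knit with `rmarkdown::render()`, which this README does not state; running the chunks interactively in order is an equally valid reading. The names `analysis_scripts` and `run_pipeline` are hypothetical.

```r
# Script order as listed in step 4.
analysis_scripts <- c(
  "01_calculate_exposure.Rmd",
  "02_clean_pma_woman_data.Rmd",
  "03_clean_pma_sdp_data.Rmd",
  "04_run_analysis.Rmd"
)

# Hypothetical helper: render each script in order, but only if all of
# them are present, so a partial run cannot happen by accident.
run_pipeline <- function(scripts_dir = "scripts", scripts = analysis_scripts) {
  paths <- file.path(scripts_dir, scripts)
  missing <- paths[!file.exists(paths)]
  if (length(missing) > 0) {
    message("Missing scripts, nothing run: ", paste(missing, collapse = ", "))
    return(invisible(FALSE))
  }
  for (p in paths) {
    message("Rendering ", p)
    rmarkdown::render(p)
  }
  invisible(TRUE)
}

run_pipeline("scripts")
```

By default `rmarkdown::render()` evaluates code relative to the folder containing the `.Rmd` file. If the scripts expect paths relative to the project root, passing `knit_root_dir = getwd()` to `render()` may be needed; check the paths used in each script first.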

